林嶔 (Lin, Chin)
Lesson 7
– 請至這裡下載本週的範例資料
dat = read.csv("Example data.csv", header = TRUE)
head(dat)
## eGFR Disease Survival.time Death Diabetes Cancer SBP DBP
## 1 34.65379 1 0.4771037 0 0 1 121.2353 121.3079
## 2 37.21183 1 3.0704424 0 1 1 122.2000 122.6283
## 3 32.60074 1 0.2607117 1 0 0 118.9136 121.7621
## 4 29.68481 1 NA NA 0 0 118.2212 112.7043
## 5 28.35726 0 0.1681673 1 0 0 116.7469 115.7705
## 6 33.95012 1 1.2238556 0 0 0 119.9936 116.3872
## Education Income
## 1 2 0
## 2 2 0
## 3 0 0
## 4 1 0
## 5 0 0
## 6 1 0
– 最常見的Y分別為:
連續變項 - 假設誤差為常態分佈 (這就是線性迴歸)
二元變項 - 假設機率服從邏輯斯分布 (這就是邏輯斯迴歸)
Formula1 = "eGFR ~ factor(Diabetes) + SBP + factor(Income)"
Result1 = glm(Formula1, family = "gaussian", data = dat)
Result1
##
## Call: glm(formula = Formula1, family = "gaussian", data = dat)
##
## Coefficients:
## (Intercept) factor(Diabetes)1 SBP
## -44.68079 -0.03554 0.64674
## factor(Income)1 factor(Income)2
## 0.32995 1.18211
##
## Degrees of Freedom: 896 Total (i.e. Null); 892 Residual
## (103 observations deleted due to missingness)
## Null Deviance: 45610
## Residual Deviance: 2669 AIC: 3536
Result2 = summary(Result1)
Result2
##
## Call:
## glm(formula = Formula1, family = "gaussian", data = dat)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -5.4186 -1.1374 -0.0075 1.0652 6.0488
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -44.680787 0.705263 -63.353 < 2e-16 ***
## factor(Diabetes)1 -0.035537 0.162866 -0.218 0.827
## SBP 0.646741 0.005857 110.423 < 2e-16 ***
## factor(Income)1 0.329949 0.193766 1.703 0.089 .
## factor(Income)2 1.182107 0.186438 6.340 3.63e-10 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 2.992399)
##
## Null deviance: 45608.2 on 896 degrees of freedom
## Residual deviance: 2669.2 on 892 degrees of freedom
## (103 observations deleted due to missingness)
## AIC: 3535.7
##
## Number of Fisher Scoring iterations: 2
dat[,"Diabetes"] = as.factor(dat[,"Diabetes"])
dat[,"Income"] = as.factor(dat[,"Income"])
glm(dat[,"eGFR"] ~ ., family = "gaussian", data = dat[,c("Diabetes", "SBP", "Income")])
##
## Call: glm(formula = dat[, "eGFR"] ~ ., family = "gaussian", data = dat[,
## c("Diabetes", "SBP", "Income")])
##
## Coefficients:
## (Intercept) Diabetes1 SBP Income1 Income2
## -44.68079 -0.03554 0.64674 0.32995 1.18211
##
## Degrees of Freedom: 896 Total (i.e. Null); 892 Residual
## (103 observations deleted due to missingness)
## Null Deviance: 45610
## Residual Deviance: 2669 AIC: 3536
– 我們挑幾個比較重要的來講…
Estimate: 這是斜率的估計值,一般來說我們檢定他是否等於0,若等於0則代表該因子不影響Y的變化。
Std. Error: 這是斜率的估計值的標準誤,95% ci是用他來計算的。
t value: 這是檢定Estimate是否顯著的檢定值,他等於Estimate/Std. Error。
Pr(>|t|): 這是t value相對應的p value。
– 以上這些數值可以利用函數「ls()」找找看在哪裡…
COEF = Result2$coefficients
COEF
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -44.68078691 0.705262678 -63.3533977 0.000000e+00
## factor(Diabetes)1 -0.03553692 0.162865536 -0.2181979 8.273248e-01
## SBP 0.64674071 0.005856955 110.4226792 0.000000e+00
## factor(Income)1 0.32994860 0.193765940 1.7028204 8.895005e-02
## factor(Income)2 1.18210745 0.186438265 6.3404766 3.634238e-10
my.pvalue = pt(abs(COEF[,3]), df = Result2$df[2], lower.tail = FALSE) * 2
all.equal(my.pvalue, COEF[,4])
## [1] TRUE
qnorm(0.975)
## [1] 1.959964
pnorm(1.96)
## [1] 0.9750021
qchisq(0.975, df = 1)
## [1] 5.023886
pchisq(3.84, df = 1)
## [1] 0.9499565
m = COEF[,1]
s = COEF[,2]
#計算下界
ci.low = m - qt(0.975, df = Result2$df[2]) * s
ci.low
## (Intercept) factor(Diabetes)1 SBP factor(Income)1
## -46.06495450 -0.35518122 0.63524569 -0.05034167
## factor(Income)2
## 0.81619867
#計算上界
ci.up = m + qt(0.975, df = Result2$df[2]) * s
ci.up
## (Intercept) factor(Diabetes)1 SBP factor(Income)1
## -43.2966193 0.2841074 0.6582357 0.7102389
## factor(Income)2
## 1.5480162
m.3 = formatC(m, 3, format = "f")
ci.low.3 = formatC(ci.low, 3, format = "f")
ci.up.3 = formatC(ci.up, 3, format = "f")
paste0(m.3, " (", ci.low.3, " to ", ci.up.3, ")")
## [1] "-44.681 (-46.065 to -43.297)" "-0.036 (-0.355 to 0.284)"
## [3] "0.647 (0.635 to 0.658)" "0.330 (-0.050 to 0.710)"
## [5] "1.182 (0.816 to 1.548)"
Result3 = glm(dat[,"Disease"] ~ ., family = "binomial", data = dat[,c("Diabetes", "SBP", "Income")])
Result3
##
## Call: glm(formula = dat[, "Disease"] ~ ., family = "binomial", data = dat[,
## c("Diabetes", "SBP", "Income")])
##
## Coefficients:
## (Intercept) Diabetes1 SBP Income1 Income2
## -0.623638 0.353512 0.013074 -0.007878 0.057746
##
## Degrees of Freedom: 876 Total (i.e. Null); 872 Residual
## (123 observations deleted due to missingness)
## Null Deviance: 1013
## Residual Deviance: 1005 AIC: 1015
Result4 = summary(Result3)
Result4
##
## Call:
## glm(formula = dat[, "Disease"] ~ ., family = "binomial", data = dat[,
## c("Diabetes", "SBP", "Income")])
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.9365 -1.5102 0.7647 0.8141 0.9526
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -0.623638 0.936715 -0.666 0.5056
## Diabetes1 0.353512 0.232143 1.523 0.1278
## SBP 0.013074 0.007823 1.671 0.0947 .
## Income1 -0.007878 0.259658 -0.030 0.9758
## Income2 0.057746 0.257617 0.224 0.8226
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 1013.4 on 876 degrees of freedom
## Residual deviance: 1005.0 on 872 degrees of freedom
## (123 observations deleted due to missingness)
## AIC: 1015
##
## Number of Fisher Scoring iterations: 4
線性迴歸是使用t value進行檢定,而邏輯斯迴歸是使用z value進行檢定
邏輯斯迴歸比較不常使用『斜率』以及其95% CI作結果的呈現,而較常呈現『勝算比(OR, Odds ratio)』以及其95% CI
COEF = Result4$coefficients
#計算p value(使用函數「pnorm()」)
my.pvalue = pnorm(abs(COEF[,3]), lower.tail = FALSE) * 2
all.equal(my.pvalue, COEF[,4])
## [1] TRUE
#計算95% CI
m = COEF[,1]
s = COEF[,2]
ci.low = m - qnorm(0.975) * s
ci.up = m + qnorm(0.975) * s
#轉變為OR (這是唯一增加的部分)
m.OR = exp(m)
ci.low.OR = exp(ci.low)
ci.up.OR = exp(ci.up)
#標準化輸出
m.OR.3 = formatC(m.OR, 3, format = "f")
ci.low.OR.3 = formatC(ci.low.OR, 3, format = "f")
ci.up.OR.3 = formatC(ci.up.OR, 3, format = "f")
paste0(m.OR.3, " (", ci.low.OR.3, " to ", ci.up.OR.3, ")")
## [1] "0.536 (0.085 to 3.361)" "1.424 (0.903 to 2.245)"
## [3] "1.013 (0.998 to 1.029)" "0.992 (0.596 to 1.650)"
## [5] "1.059 (0.639 to 1.755)"
2_1. 使用者輸入的依變項若為連續變項,則進行線性回歸運算
2_2. 使用者輸入的依變項若為類別變項,則先確認是否僅有0與1兩個類別,確認成功才進行運算
運算結束後,計算點估計及95% CI
將運算出來的95% CI擴充在原來的係數矩陣右邊,呈現類似這樣的狀況:
## Estimate Std. Error z value Pr(>|z|) OR (95% CI)
## (Intercept) "-0.624" "0.937" "-0.666" "0.506" "0.536 (0.085 to 3.361)"
## Diabetes1 "0.354" "0.232" "1.523" "0.128" "1.424 (0.903 to 2.245)"
## SBP "0.013" "0.008" "1.671" "0.095" "1.013 (0.998 to 1.029)"
## Income1 "-0.008" "0.260" "-0.030" "0.976" "0.992 (0.596 to 1.650)"
## Income2 "0.058" "0.258" "0.224" "0.823" "1.059 (0.639 to 1.755)"
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) "-44.681" "0.705" "-63.353" "<0.001"
## factor(Diabetes)1 "-0.036" "0.163" "-0.218" "0.827"
## SBP "0.647" "0.006" "110.423" "<0.001"
## factor(Income)1 "0.330" "0.194" "1.703" "0.089"
## factor(Income)2 "1.182" "0.186" "6.340" "<0.001"
## Slope (95% CI)
## (Intercept) "-44.681 (-46.065 to -43.297)"
## factor(Diabetes)1 "-0.036 (-0.355 to 0.284)"
## SBP "0.647 (0.635 to 0.658)"
## factor(Income)1 "0.330 (-0.050 to 0.710)"
## factor(Income)2 "1.182 (0.816 to 1.548)"
my_func.glm = function (y, x_data) {
lvl.y = levels(factor(y))
n.lvl.y = length(lvl.y)
if (n.lvl.y < 10) {
if (mean(lvl.y %in% 0:1) == 1) {
Result = glm(y ~ ., family = "binomial", data = x_data)
summary_Result = summary(Result)
COEF = summary_Result$coefficients
m = COEF[,1]
s = COEF[,2]
#計算下界
ci.low = m - qnorm(0.975) * s
#計算上界
ci.up = m + qnorm(0.975) * s
#轉變為OR (這是唯一增加的部分)
m.OR = exp(m)
ci.low.OR = exp(ci.low)
ci.up.OR = exp(ci.up)
#標準化輸出
m.OR.3 = formatC(m.OR, 3, format = "f")
ci.low.OR.3 = formatC(ci.low.OR, 3, format = "f")
ci.up.OR.3 = formatC(ci.up.OR, 3, format = "f")
COEF = cbind(COEF, paste0(m.OR.3, " (", ci.low.OR.3, " to ", ci.up.OR.3, ")"))
colnames(COEF)[5] = "OR (95% CI)"
COEF
} else {
cat("y為類別變項時僅能接受0與1的類別變項\n")
}
} else {
Result = glm(y ~ ., family = "gaussian", data = x_data)
summary_Result = summary(Result)
COEF = summary_Result$coefficients
m = COEF[,1]
s = COEF[,2]
#計算下界
ci.low = m - qt(0.975, df = summary_Result$df[2]) * s
#計算上界
ci.up = m + qt(0.975, df = summary_Result$df[2]) * s
m.3 = formatC(m, 3, format = "f")
ci.low.3 = formatC(ci.low, 3, format = "f")
ci.up.3 = formatC(ci.up, 3, format = "f")
COEF = cbind(COEF, paste0(m.3, " (", ci.low.3, " to ", ci.up.3, ")"))
colnames(COEF)[5] = "Slope (95% CI)"
COEF
}
}
dat[,"Income"] = as.factor(dat[,"Income"])
my_func.glm(y = dat$Education, x_data = dat[,c("SBP", "Income")])
## y為類別變項時僅能接受0與1的類別變項
my_func.glm(y = dat$Disease, x_data = dat[,c("SBP", "Income")])
## Estimate Std. Error
## (Intercept) "-1.21299612535807" "0.87057456391603"
## SBP "0.018343099853456" "0.00717946299007011"
## Income1 "-0.00331543024980331" "0.254325232647553"
## Income2 "0.11049049150071" "0.25577927092588"
## z value Pr(>|z|)
## (Intercept) "-1.39332824049183" "0.163520554560428"
## SBP "2.5549403735107" "0.0106205989530202"
## Income1 "-0.0130361829036361" "0.989598925526899"
## Income2 "0.431975941993863" "0.665758898626829"
## OR (95% CI)
## (Intercept) "0.297 (0.054 to 1.638)"
## SBP "1.019 (1.004 to 1.033)"
## Income1 "0.997 (0.605 to 1.641)"
## Income2 "1.117 (0.676 to 1.844)"
my_func.glm(y = dat$eGFR, x_data = dat[,c("SBP", "Income")])
## Estimate Std. Error t value
## (Intercept) "-44.5446073390111" "0.650718644728415" "-68.4544813643726"
## SBP "0.645468680757481" "0.00532053317500706" "121.316540941712"
## Income1 "0.320945790901737" "0.189850880461012" "1.69051515653964"
## Income2 "1.18312335438857" "0.183056090544938" "6.46317394229569"
## Pr(>|t|) Slope (95% CI)
## (Intercept) "0" "-44.545 (-45.822 to -43.268)"
## SBP "0" "0.645 (0.635 to 0.656)"
## Income1 "0.0912703998742093" "0.321 (-0.052 to 0.694)"
## Income2 "1.66581914375454e-10" "1.183 (0.824 to 1.542)"
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) "-44.681" "0.705" "-63.353" "<0.001"
## factor(Diabetes)1 "-0.036" "0.163" "-0.218" "0.827"
## SBP "0.647" "0.006" "110.423" "<0.001"
## factor(Income)1 "0.330" "0.194" "1.703" "0.089"
## factor(Income)2 "1.182" "0.186" "6.340" "<0.001"
## Slope (95% CI)
## (Intercept) "-44.681 (-46.065 to -43.297)"
## factor(Diabetes)1 "-0.036 (-0.355 to 0.284)"
## SBP "0.647 (0.635 to 0.658)"
## factor(Income)1 "0.330 (-0.050 to 0.710)"
## factor(Income)2 "1.182 (0.816 to 1.548)"
dat[,"Diabetes"] = as.factor(dat[,"Diabetes"])
dat[,"Income"] = as.factor(dat[,"Income"])
Result1 = glm(dat[,"eGFR"] ~ ., family = "gaussian", data = dat[,c("Diabetes", "SBP", "Income")])
Result2 = summary(Result1)
COEF = Result2$coefficients
COEF
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -44.68078691 0.705262678 -63.3533977 0.000000e+00
## Diabetes1 -0.03553692 0.162865536 -0.2181979 8.273248e-01
## SBP 0.64674071 0.005856955 110.4226792 0.000000e+00
## Income1 0.32994860 0.193765940 1.7028204 8.895005e-02
## Income2 1.18210745 0.186438265 6.3404766 3.634238e-10
VCOV = Result2$cov.scaled
VCOV
## (Intercept) Diabetes1 SBP Income1
## (Intercept) 0.497395445 0.0402153065 -4.110204e-03 -0.008966506
## Diabetes1 0.040215306 0.0265251829 -3.688336e-04 -0.001223519
## SBP -0.004110204 -0.0003688336 3.430393e-05 0.000040765
## Income1 -0.008966506 -0.0012235192 4.076500e-05 0.037545239
## Income2 -0.005350712 -0.0002702795 9.676645e-06 0.004237876
## Income2
## (Intercept) -5.350712e-03
## Diabetes1 -2.702795e-04
## SBP 9.676645e-06
## Income1 4.237876e-03
## Income2 3.475923e-02
#將第4項與第5項進行整合
variables = 4:5
DF = length(variables)
CHISQ = t(COEF[variables,1]) %*% solve(VCOV[variables,variables]) %*% COEF[variables,1]
PVALUE = pchisq(CHISQ, df = DF, lower.tail = FALSE)
PVALUE
## [,1]
## [1,] 1.169025e-09
#僅檢定第4項 (結果近似於使用t value進行檢定)
variables = 4
DF = length(variables)
CHISQ = t(COEF[variables,1]) %*% solve(VCOV[variables,variables]) %*% COEF[variables,1]
PVALUE = pchisq(CHISQ, df = DF, lower.tail = FALSE)
PVALUE
## [,1]
## [1,] 0.08860168
COEF[4,4]
## [1] 0.08895005
## $SUMMARY
## Global p value
## Diabetes "0.827"
## SBP "<0.001"
## Income "<0.001"
##
## $COEF
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) "-44.681" "0.705" "-63.353" "<0.001"
## Diabetes1 "-0.036" "0.163" "-0.218" "0.827"
## SBP "0.647" "0.006" "110.423" "<0.001"
## Income1 "0.330" "0.194" "1.703" "0.089"
## Income2 "1.182" "0.186" "6.340" "<0.001"
## Slope (95% CI)
## (Intercept) "-44.681 (-46.065 to -43.297)"
## Diabetes1 "-0.036 (-0.355 to 0.284)"
## SBP "0.647 (0.635 to 0.658)"
## Income1 "0.330 (-0.050 to 0.710)"
## Income2 "1.182 (0.816 to 1.548)"
my_func.glm = function (y, x_data) {
lvl.y = levels(factor(y))
n.lvl.y = length(lvl.y)
if (n.lvl.y < 10) {
if (mean(lvl.y %in% 0:1) == 1) {
Result = glm(y ~ ., family = "binomial", data = x_data)
summary_Result = summary(Result)
COEF = summary_Result$coefficients
m = COEF[,1]
s = COEF[,2]
#計算下界
ci.low = m - qnorm(0.975) * s
#計算上界
ci.up = m + qnorm(0.975) * s
#轉變為OR (這是唯一增加的部分)
m.OR = exp(m)
ci.low.OR = exp(ci.low)
ci.up.OR = exp(ci.up)
#標準化輸出
m.OR.3 = formatC(m.OR, 3, format = "f")
ci.low.OR.3 = formatC(ci.low.OR, 3, format = "f")
ci.up.OR.3 = formatC(ci.up.OR, 3, format = "f")
COEF = cbind(COEF, paste0(m.OR.3, " (", ci.low.OR.3, " to ", ci.up.OR.3, ")"))
colnames(COEF)[5] = "OR (95% CI)"
} else {
cat("y為類別變項時僅能接受0與1的類別變項\n")
}
} else {
Result = glm(y ~ ., family = "gaussian", data = x_data)
summary_Result = summary(Result)
COEF = summary_Result$coefficients
m = COEF[,1]
s = COEF[,2]
#計算下界
ci.low = m - qt(0.975, df = summary_Result$df[2]) * s
#計算上界
ci.up = m + qt(0.975, df = summary_Result$df[2]) * s
m.3 = formatC(m, 3, format = "f")
ci.low.3 = formatC(ci.low, 3, format = "f")
ci.up.3 = formatC(ci.up, 3, format = "f")
COEF = cbind(COEF, paste0(m.3, " (", ci.low.3, " to ", ci.up.3, ")"))
colnames(COEF)[5] = "Slope (95% CI)"
}
summary_result = matrix(0, ncol = 1, nrow = ncol(x_data))
rownames(summary_result) = colnames(x_data)
rownames(summary_result) = colnames(x_data)
num_COEF = summary_Result$coefficients
VCOV = summary_Result$cov.scaled
start_val = 2
for (i in 1:ncol(x_data)) {
if (class(x_data[,i]) == 'factor') {
end_val = start_val + length(levels(x_data[,i])) - 2
} else {
end_val = start_val
}
variables = start_val:end_val
DF = length(variables)
CHISQ = t(num_COEF[variables,1]) %*% solve(VCOV[variables,variables]) %*% num_COEF[variables,1]
summary_result[i,1] = pchisq(CHISQ, df = DF, lower.tail = FALSE)
start_val = end_val + 1
}
MY_LIST = list(summary_result, COEF)
MY_LIST
}
dat[,"Income"] = as.factor(dat[,"Income"])
my_func.glm(y = dat$Disease, x_data = dat[,c("SBP", "Income")])
## [[1]]
## [,1]
## SBP 0.0106206
## Income 0.9092092
##
## [[2]]
## Estimate Std. Error
## (Intercept) "-1.21299612535807" "0.87057456391603"
## SBP "0.018343099853456" "0.00717946299007011"
## Income1 "-0.00331543024980331" "0.254325232647553"
## Income2 "0.11049049150071" "0.25577927092588"
## z value Pr(>|z|)
## (Intercept) "-1.39332824049183" "0.163520554560428"
## SBP "2.5549403735107" "0.0106205989530202"
## Income1 "-0.0130361829036361" "0.989598925526899"
## Income2 "0.431975941993863" "0.665758898626829"
## OR (95% CI)
## (Intercept) "0.297 (0.054 to 1.638)"
## SBP "1.019 (1.004 to 1.033)"
## Income1 "0.997 (0.605 to 1.641)"
## Income2 "1.117 (0.676 to 1.844)"
my_func.glm(y = dat$Disease, x_data = dat[,c("Income", "SBP")])
## [[1]]
## [,1]
## Income 0.9092092
## SBP 0.0106206
##
## [[2]]
## Estimate Std. Error
## (Intercept) "-1.21299612535807" "0.870574563916031"
## Income1 "-0.00331543024980338" "0.254325232647553"
## Income2 "0.11049049150071" "0.25577927092588"
## SBP "0.018343099853456" "0.00717946299007011"
## z value Pr(>|z|)
## (Intercept) "-1.39332824049183" "0.163520554560428"
## Income1 "-0.0130361829036364" "0.989598925526899"
## Income2 "0.431975941993862" "0.66575889862683"
## SBP "2.5549403735107" "0.0106205989530202"
## OR (95% CI)
## (Intercept) "0.297 (0.054 to 1.638)"
## Income1 "0.997 (0.605 to 1.641)"
## Income2 "1.117 (0.676 to 1.844)"
## SBP "1.019 (1.004 to 1.033)"
– 目前套件『survival』是最常用來做存活分析的套件,他同時支援各式存活分析的相關功能。
install.packages("survival")
library("survival")
dat[,"Diabetes"] = as.factor(dat[,"Diabetes"])
dat[,"Income"] = as.factor(dat[,"Income"])
Result5 = coxph(Surv(dat[,"Survival.time"], dat[,"Death"]) ~ ., data = dat[,c("Diabetes", "SBP", "Income")])
Result5
## Call:
## coxph(formula = Surv(dat[, "Survival.time"], dat[, "Death"]) ~
## ., data = dat[, c("Diabetes", "SBP", "Income")])
##
## coef exp(coef) se(coef) z p
## Diabetes1 -0.013309 0.986779 0.128309 -0.104 0.917
## SBP -0.006637 0.993385 0.004487 -1.479 0.139
## Income1 -0.255620 0.774436 0.164798 -1.551 0.121
## Income2 0.107833 1.113862 0.147528 0.731 0.465
##
## Likelihood ratio test=5.95 on 4 df, p=0.203
## n= 883, number of events= 448
## (117 observations deleted due to missingness)
Result6 = summary(Result5)
Result6
## Call:
## coxph(formula = Surv(dat[, "Survival.time"], dat[, "Death"]) ~
## ., data = dat[, c("Diabetes", "SBP", "Income")])
##
## n= 883, number of events= 448
## (117 observations deleted due to missingness)
##
## coef exp(coef) se(coef) z Pr(>|z|)
## Diabetes1 -0.013309 0.986779 0.128309 -0.104 0.917
## SBP -0.006637 0.993385 0.004487 -1.479 0.139
## Income1 -0.255620 0.774436 0.164798 -1.551 0.121
## Income2 0.107833 1.113862 0.147528 0.731 0.465
##
## exp(coef) exp(-coef) lower .95 upper .95
## Diabetes1 0.9868 1.0134 0.7674 1.269
## SBP 0.9934 1.0067 0.9847 1.002
## Income1 0.7744 1.2913 0.5607 1.070
## Income2 1.1139 0.8978 0.8342 1.487
##
## Concordance= 0.535 (se = 0.016 )
## Rsquare= 0.007 (max possible= 0.997 )
## Likelihood ratio test= 5.95 on 4 df, p=0.2
## Wald test = 5.79 on 4 df, p=0.2
## Score (logrank) test = 5.81 on 4 df, p=0.2
– 計算HR的方法與OR類似…
COEF = Result6$coefficients
#計算p value(使用函數「pnorm()」)
my.pvalue = pnorm(abs(COEF[,4]), lower.tail = FALSE) * 2
all.equal(my.pvalue, COEF[,5])
## [1] TRUE
#計算95% CI
m = COEF[,1]
s = COEF[,3]
ci.low = m - qnorm(0.975) * s
ci.up = m + qnorm(0.975) * s
#轉變為HR (與OR完全一樣)
m.HR = exp(m)
ci.low.HR = exp(ci.low)
ci.up.HR = exp(ci.up)
#標準化輸出
m.HR.3 = formatC(m.HR, 3, format = "f")
ci.low.HR.3 = formatC(ci.low.HR, 3, format = "f")
ci.up.HR.3 = formatC(ci.up.HR, 3, format = "f")
paste0(m.HR.3, " (", ci.low.HR.3, " to ", ci.up.HR.3, ")")
## [1] "0.987 (0.767 to 1.269)" "0.993 (0.985 to 1.002)"
## [3] "0.774 (0.561 to 1.070)" "1.114 (0.834 to 1.487)"
COEF = Result6$coefficients
COEF
## coef exp(coef) se(coef) z Pr(>|z|)
## Diabetes1 -0.013308764 0.9867794 0.12830893 -0.1037244 0.9173881
## SBP -0.006637461 0.9933845 0.00448735 -1.4791493 0.1391004
## Income1 -0.255619728 0.7744364 0.16479768 -1.5511124 0.1208747
## Income2 0.107832965 1.1138617 0.14752797 0.7309323 0.4648205
VCOV = Result5$var
VCOV
## [,1] [,2] [,3] [,4]
## [1,] 0.0164631803 -1.936315e-04 -8.050264e-04 4.645830e-04
## [2,] -0.0001936315 2.013631e-05 9.961013e-06 -2.069759e-05
## [3,] -0.0008050264 9.961013e-06 2.715827e-02 2.771585e-03
## [4,] 0.0004645830 -2.069759e-05 2.771585e-03 2.176450e-02
#將第3項與第4項進行整合
variables = 3:4
DF = length(variables)
CHISQ = t(COEF[variables,1]) %*% solve(VCOV[variables,variables]) %*% COEF[variables,1]
PVALUE = pchisq(CHISQ, df = DF, lower.tail = FALSE)
PVALUE
## [,1]
## [1,] 0.1978171
– 注意:Cox比例風險模型不存在截距項,這點要特別注意喔!
## $SUMMARY
## Global p value
## Diabetes "0.917"
## SBP "0.139"
## Income "0.198"
##
## $COEF
## coef se(coef) z Pr(>|z|) HR (95% CI)
## Diabetes1 "-0.013" "0.128" "-0.104" "0.917" "0.987 (0.767 to 1.269)"
## SBP "-0.007" "0.004" "-1.479" "0.139" "0.993 (0.985 to 1.002)"
## Income1 "-0.256" "0.165" "-1.551" "0.121" "0.774 (0.561 to 1.070)"
## Income2 "0.108" "0.148" "0.731" "0.465" "1.114 (0.834 to 1.487)"
library(survival)
my_func = function (y, time = NULL, x_data) {
lvl.y = levels(factor(y))
n.lvl.y = length(lvl.y)
if (is.null(time)) {
if (n.lvl.y < 10) {
if (mean(lvl.y %in% 0:1) == 1) {
Result = glm(y ~ ., family = "binomial", data = x_data)
summary_Result = summary(Result)
COEF = summary_Result$coefficients
m = COEF[,1]
s = COEF[,2]
#計算下界
ci.low = m - qnorm(0.975) * s
#計算上界
ci.up = m + qnorm(0.975) * s
#轉變為OR (這是唯一增加的部分)
m.OR = exp(m)
ci.low.OR = exp(ci.low)
ci.up.OR = exp(ci.up)
#標準化輸出
m.OR.3 = formatC(m.OR, 3, format = "f")
ci.low.OR.3 = formatC(ci.low.OR, 3, format = "f")
ci.up.OR.3 = formatC(ci.up.OR, 3, format = "f")
COEF = cbind(COEF, paste0(m.OR.3, " (", ci.low.OR.3, " to ", ci.up.OR.3, ")"))
colnames(COEF)[5] = "OR (95% CI)"
} else {
cat("y為類別變項時僅能接受0與1的類別變項\n")
}
} else {
Result = glm(y ~ ., family = "gaussian", data = x_data)
summary_Result = summary(Result)
COEF = summary_Result$coefficients
m = COEF[,1]
s = COEF[,2]
#計算下界
ci.low = m - qt(0.975, df = summary_Result$df[2]) * s
#計算上界
ci.up = m + qt(0.975, df = summary_Result$df[2]) * s
m.3 = formatC(m, 3, format = "f")
ci.low.3 = formatC(ci.low, 3, format = "f")
ci.up.3 = formatC(ci.up, 3, format = "f")
COEF = cbind(COEF, paste0(m.3, " (", ci.low.3, " to ", ci.up.3, ")"))
colnames(COEF)[5] = "Slope (95% CI)"
}
} else {
Result = coxph(Surv(time, y) ~ ., data = x_data)
summary_Result = summary(Result)
COEF = summary_Result$coefficients
m = COEF[,1]
s = COEF[,3]
#計算下界
ci.low = m - qnorm(0.975) * s
#計算上界
ci.up = m + qnorm(0.975) * s
#轉變為HR (這是唯一增加的部分)
m.HR = exp(m)
ci.low.HR = exp(ci.low)
ci.up.HR = exp(ci.up)
#標準化輸出
m.HR.3 = formatC(m.HR, 3, format = "f")
ci.low.HR.3 = formatC(ci.low.HR, 3, format = "f")
ci.up.HR.3 = formatC(ci.up.HR, 3, format = "f")
COEF = cbind(COEF, paste0(m.HR.3, " (", ci.low.HR.3, " to ", ci.up.HR.3, ")"))
colnames(COEF)[6] = "HR (95% CI)"
}
summary_result = matrix(0, ncol = 1, nrow = ncol(x_data))
rownames(summary_result) = colnames(x_data)
rownames(summary_result) = colnames(x_data)
num_COEF = summary_Result$coefficients
if (is.null(time)) {
VCOV = summary_Result$cov.scaled
start_val = 2
} else {
VCOV = Result$var
start_val = 1
}
for (i in 1:ncol(x_data)) {
if (class(x_data[,i]) == 'factor') {
end_val = start_val + length(levels(x_data[,i])) - 2
} else {
end_val = start_val
}
variables = start_val:end_val
DF = length(variables)
CHISQ = t(num_COEF[variables,1]) %*% solve(VCOV[variables,variables]) %*% num_COEF[variables,1]
summary_result[i,1] = pchisq(CHISQ, df = DF, lower.tail = FALSE)
start_val = end_val + 1
}
MY_LIST = list(summary_result, COEF)
MY_LIST
}
dat[,"Income"] = as.factor(dat[,"Income"])
my_func.glm(y = dat$Disease, x_data = dat[,c("SBP", "Income")])
## [[1]]
## [,1]
## SBP 0.0106206
## Income 0.9092092
##
## [[2]]
## Estimate Std. Error
## (Intercept) "-1.21299612535807" "0.87057456391603"
## SBP "0.018343099853456" "0.00717946299007011"
## Income1 "-0.00331543024980331" "0.254325232647553"
## Income2 "0.11049049150071" "0.25577927092588"
## z value Pr(>|z|)
## (Intercept) "-1.39332824049183" "0.163520554560428"
## SBP "2.5549403735107" "0.0106205989530202"
## Income1 "-0.0130361829036361" "0.989598925526899"
## Income2 "0.431975941993863" "0.665758898626829"
## OR (95% CI)
## (Intercept) "0.297 (0.054 to 1.638)"
## SBP "1.019 (1.004 to 1.033)"
## Income1 "0.997 (0.605 to 1.641)"
## Income2 "1.117 (0.676 to 1.844)"
my_func(y = dat$Death, time = dat$Survival.time, x_data = dat[,c("Income", "SBP")])
## [[1]]
## [,1]
## Income 0.2242644
## SBP 0.1165677
##
## [[2]]
## coef exp(coef) se(coef)
## Income1 "-0.213973925878094" "0.807369436197337" "0.160557604810633"
## Income2 "0.135928247771198" "1.14559969131463" "0.14487635651984"
## SBP "-0.00656382892311547" "0.993457665946688" "0.00418252926047729"
## z Pr(>|z|) HR (95% CI)
## Income1 "-1.33269256308639" "0.182632714802121" "0.807 (0.589 to 1.106)"
## Income2 "0.938236238378785" "0.348123019048662" "1.146 (0.862 to 1.522)"
## SBP "-1.569344412038" "0.116567709083106" "0.993 (0.985 to 1.002)"
my_func(y = dat$Death, time = dat$Survival.time, x_data = dat[,c("SBP", "Income")])
## [[1]]
## [,1]
## SBP 0.1165677
## Income 0.2242644
##
## [[2]]
## coef exp(coef) se(coef)
## SBP "-0.00656382892311547" "0.993457665946688" "0.00418252926047729"
## Income1 "-0.213973925878094" "0.807369436197337" "0.160557604810633"
## Income2 "0.135928247771198" "1.14559969131463" "0.14487635651984"
## z Pr(>|z|) HR (95% CI)
## SBP "-1.569344412038" "0.116567709083106" "0.993 (0.985 to 1.002)"
## Income1 "-1.33269256308639" "0.182632714802121" "0.807 (0.589 to 1.106)"
## Income2 "0.938236238378785" "0.348123019048662" "1.146 (0.862 to 1.522)"
現在你已經學會了如何在R內進行線性迴歸、邏輯斯迴歸或Cox比例風險模型。
在醫學期刊中,若使用這些統計方法,通常會一次看數個調整前/調整後係數的差異。
請你做一個函數,讓使用者輸入以下資訊,並獲得一張Paper上常看到的表格:
選擇線性迴歸、邏輯斯迴歸或Cox比例風險模型
輸入正確的依變項
輸入有哪些變項想要進行檢定(自變項集合)
輸入需要共同調整的變項(調整變項集合)
– 注意,假設SBP這個變項同時出現在自變項集合中及調整變項集合中,則在調整時不需要再調整它一次
## Independent variable Crude-HR (95% CI) p-value
## 1 "Diabetes" "" "0.321"
## 2 "Diabetes:0" "1.00" ""
## 3 "Diabetes:1" "0.89 (0.71 to 1.12)" "0.321"
## 4 "SBP" "0.99 (0.99 to 1.00)" "0.150"
## 5 "Income" "" "0.212"
## 6 "Income:0" "1.00" ""
## 7 "Income:1" "0.81 (0.60 to 1.11)" "0.189"
## 8 "Income:2" "1.16 (0.87 to 1.53)" "0.310"
## 9 "Education" "" "<0.001"
## 10 "Education:0" "1.00" ""
## 11 "Education:1" "0.75 (0.61 to 0.93)" "0.007"
## 12 "Education:2" "0.39 (0.30 to 0.50)" "<0.001"
## Adj-HR (95% CI)# p-value
## 1 "" "0.785"
## 2 "1.00" ""
## 3 "0.97 (0.75 to 1.24)" "0.785"
## 4 "0.99 (0.99 to 1.00)" "0.191"
## 5 "" "0.198"
## 6 "1.00" ""
## 7 "0.77 (0.56 to 1.07)" "0.121"
## 8 "1.11 (0.83 to 1.49)" "0.465"
## 9 "" "<0.001"
## 10 "1.00" ""
## 11 "0.72 (0.58 to 0.89)" "0.003"
## 12 "0.38 (0.29 to 0.50)" "<0.001"
– 結合之前的兩節課,大部分期刊中常見的統計方法都已經提到了,儘管你可能不知道是怎麼算的,但一定要熟悉結果的解釋。
– 你應該已經學會這些東西了:
描述性統計
單變項統計(類別-類別、類別-連續)
相關性(類別-類別、類別-連續、連續-連續)
多變項統計(GLM、Cox model)